Machine Learning

Software Engineering and Computing

Ph.D. Pablo Eduardo Caicedo Rodríguez

2023-08-01

Introduction

The Instructor

Education

Doctorate in Electronics Sciences. Master's in Electronics and Telecommunications Engineering. Electronics and Telecommunications Engineer.

Interests

Biomechanics, devices for human motion analysis, data science.

Positions

Professor, Faculty of Engineering & Natural Sciences

Researcher, Advanced Perception and Robotics research line – GITA

Director of the MEDES Research Group.

Director of the Uniautonoma data laboratory.

Contact:

pablo.caicedo.r@uniautonoma.edu.co

Course Content

  1. Exploratory data analysis
  2. Regression problems
  3. Advanced topics in classification

Assessment

  1. Reading comprehension (English) (10%)
  2. Assignment 001. Exploratory data analysis (25%)
  3. Assignment 002. Regression problems (25%)
  4. Assignment 003. Final project (40%)

Resources

Classes

Monday, Tuesday, Thursday, and Friday, 11:00 – 13:00, Room 504

Microsoft Teams room

Software

Interpreters and distributions: Python, R, LaTeX (TeX Live), Anaconda.

IDEs: Visual Studio Code, Google Colaboratory (R, Python)

Libraries: pandas, Matplotlib, seaborn, Keras, TensorFlow, NumPy, scikit-learn, SciPy

Learning tracking: Moodle

Bibliography

  1. B. Boehmke and B. M. Greenwell, Hands-On Machine Learning with R. Boca Raton: CRC Press, 2019.

  2. G. Bonaccorso, Mastering Machine Learning Algorithms: Expert Techniques to Implement Popular Machine Learning Algorithms and Fine-Tune Your Models. 2018.

  3. M. Fenner, Machine Learning in Python for Everyone. Boston, MA: Addison-Wesley, 2019.

  4. K. Kolodiazhnyi, Hands-On Machine Learning with C++: Build, Train, and Deploy End-to-End Machine Learning and Deep Learning Pipelines. Birmingham: Packt Publishing, Limited, 2020. Accessed: September 28, 2021.

  5. M. Kubat, An Introduction to Machine Learning. Cham: Springer International Publishing, 2017. doi: 10.1007/978-3-319-63913-0.

  6. S. Raschka and V. Mirjalili, Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow, Second edition, Fourth release, [fully revised and updated]. Birmingham Mumbai: Packt Publishing, 04.

  7. S. Skansi, Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence. Cham: Springer International Publishing, 2018. doi: 10.1007/978-3-319-73004-2.

Exploratory Data Analysis

A little reminder …

Exploratory Data Analysis (EDA)

Definition

The art of looking for the underlying structure of the information in one or more datasets.

Definition by Diaconis, P.

We look at numbers or graphs and try to find patterns. We pursue leads suggested by background information, imagination, patterns perceived, and experience with other data analyses.

Exploratory Data Analysis (EDA)

EDA depends on two things:

  1. The measurement scale of each variable (type of information: categorical, numerical, continuous, discrete, etc.).

  2. The objective and type of the analysis (graphical, numerical, correlation, etc.).
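As a minimal sketch of the first point, the summary chosen for a column can follow its variable type: descriptive statistics for a numerical variable, frequency counts for a categorical one. The helper name `summarize` and the "major"/"minor" labels are assumptions made for illustration; the tempo values are the five shown in the Spotify sample later in these slides.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Small illustrative frame: one continuous variable and one categorical variable.
df = pd.DataFrame({
    "tempo": [104.851, 102.009, 130.418, 169.980, 103.220],
    "mode": pd.Series(["major", "major", "major", "major", "minor"], dtype="category"),
})

def summarize(column):
    """Pick a summary that matches the variable type (hypothetical helper)."""
    if is_numeric_dtype(column):
        # Numerical scale: count, mean, spread, and quartiles.
        return column.describe()
    # Categorical scale: how often each category occurs.
    return column.value_counts()

tempo_stats = summarize(df["tempo"])
mode_counts = summarize(df["mode"])
print(tempo_stats)
print(mode_counts)
```

The same split usually drives the graphical side of EDA as well: histograms or box plots for numerical variables, bar charts for categorical ones.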

Learning from the example

Spotify

“Spotify offers digital copyright restricted recorded music and podcasts, including more than 82 million songs, from record labels and media companies.” (Source: Wikipedia)

Spotify Dataset

The dataset is hosted on Kaggle.

Analysis Objective

Spotify wants to know whether there is a relationship between the popularity of a song and the number of followers of its artists. The result would inform strategies for attracting new artists to the platform.
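One way this objective could be approached is sketched below, under stated assumptions: follower counts are taken to live in a separate artist table with `id` and `followers` columns (hypothetical here), and every value in both toy tables is made up for illustration. The sketch links each track to its artists, averages the follower counts per track, and computes a rank (Spearman) correlation with popularity.

```python
import ast
import pandas as pd

# Toy stand-ins for the real tables; all values are invented for illustration.
tracks = pd.DataFrame({
    "name": ["song_a", "song_b", "song_c", "song_d"],
    "popularity": [10, 40, 55, 80],
    "id_artists": ["['a1']", "['a2']", "['a1', 'a3']", "['a4']"],
})
# Hypothetical artist table: the `followers` column is an assumption.
artists = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4"],
    "followers": [100, 5_000, 20_000, 1_000_000],
})

# id_artists holds the text of a Python list, so parse it into a real list.
tracks["id_artists"] = tracks["id_artists"].apply(ast.literal_eval)

# One row per (track, artist) pair, then attach the follower counts.
pairs = tracks.explode("id_artists").merge(
    artists, left_on="id_artists", right_on="id", how="left"
)

# Summarize each track by the mean follower count of its artists.
per_track = pairs.groupby("name", as_index=False).agg(
    popularity=("popularity", "first"),
    mean_followers=("followers", "mean"),
)

# Spearman correlation computed as the Pearson correlation of the ranks.
rho = per_track["popularity"].rank().corr(per_track["mean_followers"].rank())
print(per_track)
print("Spearman rho:", rho)
```

A rank correlation is used here because follower counts tend to be heavily skewed; whether the mean is the right way to combine several artists on one track is a modeling choice, not something the slides prescribe.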

Learning from the examples

General Workflow

  1. Import the dataset.
  2. Preprocess the dataset.
  3. Perform EDA on the dataset.
  4. Train the machine learning model.
  5. Predict the target using the trained model.
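The five steps above can be sketched end to end as follows. This is only an illustration on synthetic data: the columns `x` and `y`, the linear relationship, and the choice of `LinearRegression` are assumptions, not part of the course material.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# 1. Import the dataset (synthetic here; a real project would read a file).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.uniform(0, 10, 200)})
df["y"] = 3.0 * df["x"] + 2.0 + rng.normal(0, 1, 200)
df.loc[::25, "x"] = np.nan  # inject a few missing values on purpose

# 2. Preprocess: drop the rows that contain missing values.
clean = df.dropna().reset_index(drop=True)

# 3. EDA: summary statistics and the linear association between x and y.
summary = clean.describe()
r = clean["x"].corr(clean["y"])

# 4. Train the model on a training split.
X_train, X_test, y_train, y_test = train_test_split(
    clean[["x"]], clean["y"], test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# 5. Predict the target for unseen rows and score the predictions.
y_pred = model.predict(X_test)
r2 = model.score(X_test, y_test)
print(summary)
print("correlation:", r, "test R^2:", r2)
```

Holding out a test split before training keeps step 5 honest: the model is scored on rows it never saw.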

Import dataset.

General workflow for importing a dataset in Python

  1. Install the conda environment manager.

  2. Create a suitable conda environment.

  3. Install the Python libraries. At a minimum, a machine learning project without deployment needs:

    1. NumPy
    2. pandas
    3. Matplotlib
    4. seaborn
    5. scikit-learn
    6. Jupyter
  4. Install a suitable IDE.

  5. Script, script, script.
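Before scripting, it can help to confirm that the active environment really contains the libraries listed above. The following is a minimal sketch; the helper name `missing_packages` is hypothetical, and checking Jupyter through its `jupyter_core` package is an assumption about how it is installed.

```python
import importlib.util

# Import names of the listed libraries. scikit-learn imports as `sklearn`;
# Jupyter is checked through `jupyter_core` (an assumption).
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn", "jupyter_core"]

def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

`find_spec` only locates a package without importing it, so the check stays fast even for heavy libraries.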

Import dataset

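import pandas as pd
path_to_data = "data"  # assumption: folder that holds the Kaggle CSV files
data_spotify = pd.read_csv(path_to_data + "/tracks.csv")
data_spotify.head()  # preview the first five rows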
First five rows of data_spotify:

  | id | name | popularity | duration_ms | explicit | artists | id_artists | release_date | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | time_signature
0 | 35iwgR4jXetI318WEWsa1Q | Carve | 6 | 126903 | 0 | ['Uli'] | ['45tIt06XoI0Iio4LBEVpls'] | 1922-02-22 | 0.645 | 0.4450 | 0 | -13.338 | 1 | 0.4510 | 0.674 | 0.7440 | 0.151 | 0.127 | 104.851 | 3
1 | 021ht4sdgPcrDgSk7JTbKY | Capítulo 2.16 - Banquero Anarquista | 0 | 98200 | 0 | ['Fernando Pessoa'] | ['14jtPCOoNZwquk5wd9DxrY'] | 1922-06-01 | 0.695 | 0.2630 | 0 | -22.136 | 1 | 0.9570 | 0.797 | 0.0000 | 0.148 | 0.655 | 102.009 | 1
2 | 07A5yehtSnoedViJAZkNnc | Vivo para Quererte - Remasterizado | 0 | 181640 | 0 | ['Ignacio Corsini'] | ['5LiOoJbxVSAMkBS2fUm3X2'] | 1922-03-21 | 0.434 | 0.1770 | 1 | -21.180 | 1 | 0.0512 | 0.994 | 0.0218 | 0.212 | 0.457 | 130.418 | 5
3 | 08FmqUhxtyLTn6pAh6bk45 | El Prisionero - Remasterizado | 0 | 176907 | 0 | ['Ignacio Corsini'] | ['5LiOoJbxVSAMkBS2fUm3X2'] | 1922-03-21 | 0.321 | 0.0946 | 7 | -27.961 | 1 | 0.0504 | 0.995 | 0.9180 | 0.104 | 0.397 | 169.980 | 3
4 | 08y9GfoqCWfOGsKdwojr5e | Lady of the Evening | 0 | 163080 | 0 | ['Dick Haymes'] | ['3BiJGZsyX9sJchTqcSA7Su'] | 1922 | 0.402 | 0.1580 | 3 | -16.900 | 0 | 0.0390 | 0.989 | 0.1300 | 0.311 | 0.196 | 103.220 | 4
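The sample rows hint at preprocessing needs: `release_date` mixes full dates with bare years (`1922-02-22` versus `1922`), and `artists` looks like the text of a Python list rather than an actual list. A possible cleanup is sketched below on two of the rows shown; it assumes those columns are read as strings, and the new column names `release_year` and `duration_min` are hypothetical.

```python
import ast
import pandas as pd

# Two of the rows shown above, reduced to the columns being cleaned.
sample = pd.DataFrame({
    "artists": ["['Uli']", "['Dick Haymes']"],
    "release_date": ["1922-02-22", "1922"],
    "duration_ms": [126903, 163080],
})

# Turn the list-like text into real Python lists.
sample["artists"] = sample["artists"].apply(ast.literal_eval)

# The year is always the first four characters, whatever the date precision.
sample["release_year"] = sample["release_date"].str[:4].astype(int)

# Minutes are easier to read than milliseconds during exploration.
sample["duration_min"] = sample["duration_ms"] / 60_000

print(sample)
```

Extracting only the year sidesteps the mixed precision without inventing a month and day for the rows that lack them.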

Regression Problems

Advanced Topics in Classification